A Novel Neural Machine Translation Approach for low-resource Sanskrit-Hindi Language pair


Abstract

Sanskrit is one of the earliest native languages and is aptly described as "the gods' language" because of its wide use in Indian religious literature of the past. However, it is becoming less popular in modern India. Due in significant part to the need for more materials for translation both into and out of Sanskrit, the language is no longer commonly utilized. This study explores the feasibility of using machine translation (MT) to provide a link between Sanskrit and its contemporary descendant language, Hindi. A study was conducted of existing modelling methodologies, notably Statistical Machine Translation (SMT), and a novel deep learning-based Neural Machine Translation strategy is proposed, using a manually created parallel corpus for the Sanskrit-Hindi language pair. While SMT creates interpretations by mapping phrases from source to destination, using statistical models and bilingual text corpora for learning parameters, neural machine translation (NMT) frequently models the entire translation in a single integrated model, using a convolutional network to calculate the probability of a word sequence. The NMT model is implemented in an encoder-decoder paradigm with an attention mechanism and the inclusion of gated recurrent units. The approach was developed and then evaluated on both automated and human-based metrics, and the results show that our model outperforms existing techniques such as Moses, with a BLEU score of 53.8% compared to 34.56%. This article examines the largely unexplored area of Sanskrit-to-Hindi translation and discusses its main benefits and drawbacks while providing a fresh viewpoint on the subject.
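The BLEU figures quoted in the abstract come from an automatic metric that compares system output against reference translations by n-gram overlap. As a rough illustration of how such a score is computed, the following is a minimal sketch of unsmoothed, single-reference, sentence-level BLEU in plain Python. The function names, the whitespace tokenization, and the toy sentences are assumptions made for this example; the abstract does not say which BLEU implementation, tokenization, or smoothing the authors used, and published scores are normally computed at corpus level and reported on a 0-100 scale.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU for one sentence pair, in [0, 1].

    Illustrative helper (hypothetical name), not the authors' evaluation code.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # no smoothing: one empty precision zeroes the score
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


# Toy example with placeholder English tokens (not data from the paper).
reference = "the cat sat on the mat".split()
print(sentence_bleu(reference, reference))
print(sentence_bleu("the cat sat on the".split(), reference))
```

Multiplying the returned value by 100 gives the percentage-style figure in which BLEU is usually reported. Because this sketch applies no smoothing, any sentence with no matching 4-gram scores zero, which is one reason practical toolkits aggregate counts over a whole test set.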


Similar resources

Neural machine translation for low-resource languages

Neural machine translation (NMT) approaches have improved the state of the art in many machine translation settings over the last couple of years, but they require large amounts of training data to produce sensible output. We demonstrate that NMT can be used for low-resource languages as well, by introducing more local dependencies and using word alignments to learn sentence reordering during t...


Translation Divergence in English-Sanskrit-Hindi Language Pairs

The development of a machine translation system requires that we identify the patterns of divergence between two languages. Though a number of MT developers have given attention to this problem, it is difficult to derive general strategies which can be used for any language pair. Therefore, further exploration is always needed to identify different sources of translation divergence in different pa...


Phrase Pair Mappings for Hindi-English Statistical Machine Translation

In this paper, we present our work on the creation of lexical resources for Machine Translation between English and Hindi. We describe the development of phrase pair mappings for our experiments and the comparative performance evaluation between different trained models on top of the baseline Statistical Machine Translation system. We focused on augmenting the parallel corpus with more voc...


Xhosa-English Machine Translation: Working with a Low-Resource Language

This report details the author’s experiences as a Distributed Research Experience for Undergraduates (DREU) summer research intern at Carnegie Mellon University’s Language Technologies Institute. Under the guidance of Prof. Carolyn Rosé, the author attempted to implement a phrase-based translation (i.e., statistical machine translation, or SMT) system for translating Xhosa text into English usi...


Multilingual Neural Machine Translation for Low Resource Languages

Neural Machine Translation (NMT) has been shown to be more effective in translation tasks compared to the Phrase-Based Statistical Machine Translation (PBMT). However, NMT systems are limited in translating low-resource languages (LRL), due to the fact that neural methods require a large amount of parallel data to learn effective mappings between languages. In this work we show how so-called mu...



Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2023

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3591207